Synthetic voice


MND left her without a voice. Eight seconds of scratchy audio gave it back to her

BBC News

"After such a long time, I couldn't really remember my voice," Sarah Ezekiel tells BBC Access All. "When I first heard it again, I felt like crying." The onset of motor neurone disease (MND) left Sarah without a voice and the use of her hands at the age of 34. It was within months of her becoming a mum for the second time.


Will AI shape the way we speak? The emerging sociolinguistic influence of synthetic voices

Székely, Éva, Miniota, Jūra, Hejná, Míša

arXiv.org Artificial Intelligence

The growing prevalence of conversational voice interfaces, powered by developments in both speech and language technologies, raises important questions about their influence on human communication. While written communication can signal identity through lexical and stylistic choices, voice-based interactions inherently amplify socioindexical elements - such as accent, intonation, and speech style - which more prominently convey social identity and group affiliation. There is evidence that even passive media such as television are likely to influence the audience's linguistic patterns. Unlike passive media, conversational AI is interactive, creating a more immersive and reciprocal dynamic that holds a greater potential to impact how individuals speak in everyday interactions. Such heightened influence can be expected to arise from phenomena such as acoustic-prosodic entrainment and linguistic accommodation, which occur naturally during interaction and enable users to adapt their speech patterns in response to the system. While this phenomenon is still emerging, its potential societal impact could provide organisations, movements, and brands with a subtle yet powerful avenue for shaping and controlling public perception and social identity. We argue that the socioindexical influence of AI-generated speech warrants attention and should become a focus of interdisciplinary research, leveraging new and existing methodologies and technologies to better understand its implications.


Voice Cloning for Dysarthric Speech Synthesis: Addressing Data Scarcity in Speech-Language Pathology

Moell, Birger, Aronsson, Fredrik Sand

arXiv.org Artificial Intelligence

This study explores voice cloning to generate synthetic speech replicating the unique patterns of individuals with dysarthria. Using the TORGO dataset, we address data scarcity and privacy challenges in speech-language pathology. Our contributions include demonstrating that voice cloning preserves dysarthric speech characteristics, analyzing differences between real and synthetic data, and discussing implications for diagnostics, rehabilitation, and communication. We cloned voices from dysarthric and control speakers using a commercial platform, ensuring gender-matched synthetic voices. A licensed speech-language pathologist (SLP) evaluated a subset for dysarthria, speaker gender, and synthetic indicators. The SLP correctly identified dysarthria in all cases and speaker gender in 95% but misclassified 30% of synthetic samples as real, indicating high realism. Our results suggest synthetic speech effectively captures disordered characteristics and that voice cloning has advanced to produce high-quality data resembling real speech, even to trained professionals. This has critical implications for healthcare, where synthetic data can mitigate data scarcity, protect privacy, and enhance AI-driven diagnostics. By enabling the creation of diverse, high-quality speech datasets, voice cloning can improve generalizable models, personalize therapy, and advance assistive technologies for dysarthria. We publicly release our synthetic dataset to foster further research and collaboration, aiming to develop robust models that improve patient outcomes in speech-language pathology.


Zero-Shot vs. Few-Shot Multi-Speaker TTS Using Pre-trained Czech SpeechT5 Model

Lehečka, Jan, Hanzlíček, Zdeněk, Matoušek, Jindřich, Tihelka, Daniel

arXiv.org Artificial Intelligence

In this paper, we experimented with the SpeechT5 model pre-trained on large-scale datasets. We pre-trained the foundation model from scratch and fine-tuned it on a large-scale robust multi-speaker text-to-speech (TTS) task. We tested the model's capabilities in zero- and few-shot scenarios. Based on two listening tests, we evaluated the synthetic audio quality and how closely the synthetic voices resemble real voices. Our results showed that the SpeechT5 model can generate a synthetic voice for any speaker using only one minute of the target speaker's data. We successfully demonstrated the high quality and similarity of our synthetic voices on publicly known Czech politicians and celebrities.


AI Can't Make Music

The Atlantic - Technology

The first concert I bought tickets to after the pandemic subsided was a performance of the British singer-songwriter Birdy, held last April in Belgium. I've listened to Birdy more than to any other artist; her voice has pulled me through the hardest and happiest stretches of my life. I know every lyric to nearly every song in her discography, but that night Birdy's voice had the same effect as the first time I'd listened to her, through beat-up headphones connected to an iPod over a decade ago--a physical shudder, as if a hand had reached across time and grazed me, somehow, just beneath the skin. Countless people around the world have their own version of this ineffable connection, with Taylor Swift, perhaps, or the Beatles, Bob Marley, or Metallica. My feelings about Birdy's music were powerful enough to propel me across the Atlantic, just as tens of thousands of people flocked to the Sphere to see Phish earlier this year, or some 400,000 went to Woodstock in 1969.


Scarlett Johansson Says OpenAI Ripped Off Her Voice for ChatGPT

WIRED

Last week OpenAI revealed a new conversational interface for ChatGPT with an expressive synthetic voice strikingly similar to that of the AI assistant played by Scarlett Johansson in the sci-fi movie Her--only to suddenly disable the new voice over the weekend. On Monday, Johansson issued a statement claiming to have forced that reversal, after her lawyers demanded OpenAI clarify how the new voice was created. Johansson's statement, relayed to WIRED by her publicist, claims that OpenAI CEO Sam Altman asked her last September to provide ChatGPT's new voice but that she declined. She describes being astounded to see the company demo a new voice for ChatGPT last week that sounded like her anyway. "When I heard the release demo I was shocked, angered, and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine that my closest friends and news outlets could not tell the difference," the statement reads.


Creating an African American-Sounding TTS: Guidelines, Technical Challenges, and Surprising Evaluations

Pinhanez, Claudio, Fernandez, Raul, Grave, Marcelo, Nogima, Julio, Hoory, Ron

arXiv.org Artificial Intelligence

The scarcity of African American-sounding voices in current TTS platforms poses challenges for applications interested in targeting specific demographics (e.g., an African American business or NGO; a voice-tutoring system for children who are not of White ethnicity, etc.). The ultimate goal of the project described in this paper is to provide to designers, developers, and enterprises the choice of having a professional voice which is clearly recognizable as African American, and therefore more able to address diversity and inclusiveness issues. Being more precise, our goal is to create an African American Text-to-Speech system, which we will refer to simply as an African American voice or AA voice, able to produce synthetic audio segments from standard English texts, and which will be recognized by African American speakers and non-speakers as sounding like a native African American speaker. The AA voice should exhibit a level of technical quality similar to the Standard American English (SAE) synthetic voices currently available through professional platforms. The evaluation of the technical quality of the AA voice, however, is not addressed in this paper, which focuses primarily on whether the AA voice can be recognized as sounding like an African American speaker. Linguists [27, 28] have described a continuum of dialects under what is often termed African American Vernacular English (AAVE). At one end of the spectrum, one finds the largest deviation from SAE in terms of lexicon (including slang), syntax and morphology, and phonological/phonetic properties. At the other end, AAVE speakers begin to approach SAE in terms of lexicon and grammar but still retain marked speech characteristics (primarily in terms of intonation, phonation, and vowel placement [14, 28]) which grant the speech a distinctive identity which listeners use as cues in the perception of African American English [44].


Meta unveils Voicebox AI: Should we all be worried?

FOX News

Meta's latest artificial intelligence model, called Voicebox, is a customized text-to-speech product that can mimic any specific voice of your choosing.


AI generated Joe Rogan podcast stuns social media with 'terrifying' accuracy: 'Mind blowingly dangerous'

FOX News

The video was made with ChatGPT and is not the actual words of Joe Rogan or Sam Altman. Artificial intelligence chatbot ChatGPT has created a 51-minute episode of The Joe Rogan Experience featuring nearly flawless representations of the podcast host's voice and the voice of OpenAI CEO Sam Altman. The episode begins with an AI-generated Rogan welcoming the audience to the first episode of the "Joe Rogan AI Experience," speaking in a manner and tone that is difficult to distinguish from the real person. "I'm your host, Joe Rogan, or at least that's what this AI model thinks I sound like. Let me tell you, folks, this is some next-level stuff we've got going on here today," faux Rogan continues.


'Of course it's disturbing': will AI change Hollywood forever?

#artificialintelligence

What will AI (artificial intelligence) do to Hollywood? Who better to answer that question than ChatGPT, a thrilling but scary chatbot developed by OpenAI. When the Guardian asked it about AI's potential impact on the film industry, it made the following points:

Scriptwriting: AI can be used to analyze existing screenplays and create new ones, potentially leading to more efficient and cost-effective screenwriting.

Pre-production: AI can be used to streamline the pre-production process, including casting, location scouting and storyboarding.

Special effects: AI can be used to create more realistic and immersive special effects, potentially reducing the need for practical effects and saving time and money in post-production.